1.6 Pessimistic Bias

On the two problems first raised in 1.3 Resubstitution Validation and the Holdout Method:

1. Violation of independence, and class proportions that change under subsampling (covered in 1.4 Stratification)

the violation of independence and the changing class proportions upon subsampling

2. A model capacity problem caused by subsampling the dataset, which leaves a smaller training set (1.5 Step 4)

Pessimistic bias. It can be removed, but then the generalization performance can no longer be estimated.

This second problem is the one examined in this section.

Pessimistic Bias (the second problem with the holdout method)

If a model has not reached its capacity, the performance estimate would be pessimistically biased.

Gloss: when a model has not yet reached its capacity, its generalization performance is estimated with a pessimistic bias.

"Has not reached its capacity" means the state in which giving the model more data would still produce a better model.
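
One way to make this precise, using notation that does not appear in the original text and is introduced here only for illustration: let $\mathrm{ACC}(m)$ be the expected true accuracy of the model fit on $m$ training examples, let $n$ be the size of the whole dataset, and let the holdout split keep $n_{\text{train}} < n$ examples for training. Assuming the test set is independent of the training set, the holdout estimate is unbiased for the model that was actually trained, so:

```latex
\mathbb{E}\big[\widehat{\mathrm{ACC}}_{\text{holdout}}\big]
  = \mathrm{ACC}(n_{\text{train}})
  \le \mathrm{ACC}(n),
  \qquad n_{\text{train}} < n
```

The inequality assumes that $\mathrm{ACC}$ does not decrease as the training set grows. It is strict exactly when the model has not reached its capacity, and the gap $\mathrm{ACC}(n) - \mathrm{ACC}(n_{\text{train}})$ is the pessimistic bias.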

To address this issue, one might fit the model to the whole dataset after estimating the generalization performance (see Figure 2 step 4)

Gloss: to deal with the pessimistic bias, one may train the model on the whole dataset after the generalization performance has been estimated.

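However, the test set has then been used for training, so the generalization performance of the model trained on the whole dataset can no longer be estimated.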

(Its generalization performance is expected to have improved, but there is no way to measure by how much.)

but we should be aware that our estimate of the generalization performance may be pessimistically biased if only a portion of the dataset, the training dataset, is used for model fitting

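Gloss: we should recognize that when only part of the dataset (namely the training set) is used to train the model, the estimate of the generalization performance may be pessimistic.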
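
The effect can be checked on synthetic data, where the true accuracy is known. The following is a minimal sketch and not taken from the original text: the two-Gaussian setup, the nearest-centroid classifier, and all function names and parameter values (`run`, `dim=50`, `n_per_class=30`, and so on) are assumptions chosen for illustration. Because the data distribution is known, the true accuracy of a linear rule can be computed in closed form, which lets us compare the holdout estimate with the true accuracy of the model refit on the whole dataset, something that is impossible with real data.

```python
import math
import random


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample(n_per_class, mean, rng):
    """Draw n_per_class points from each of N(+mean, I) (label 1) and N(-mean, I) (label 0)."""
    data = []
    for label, sign in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            data.append(([sign * m + rng.gauss(0.0, 1.0) for m in mean], label))
    return data


def fit_centroid(data):
    """Nearest-centroid classifier written as a linear rule: predict 1 if w.x > b."""
    dim = len(data[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in data:
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    c0 = [s / counts[0] for s in sums[0]]
    c1 = [s / counts[1] for s in sums[1]]
    w = [a - c for a, c in zip(c1, c0)]
    b = sum(wj * (a + c) / 2.0 for wj, a, c in zip(w, c1, c0))
    return w, b


def empirical_accuracy(model, data):
    """Fraction of labeled points in data that the model classifies correctly."""
    w, b = model
    hits = sum((sum(wj * xj for wj, xj in zip(w, x)) > b) == (y == 1) for x, y in data)
    return hits / len(data)


def true_accuracy(model, mean):
    """Exact accuracy of the linear rule under the known (balanced) data distribution."""
    w, b = model
    norm = math.sqrt(sum(wj * wj for wj in w))
    proj = sum(wj * mj for wj, mj in zip(w, mean))
    return 0.5 * phi((proj - b) / norm) + 0.5 * phi((proj + b) / norm)


def run(reps=300, dim=50, n_per_class=30, delta=2.0, seed=0):
    """Average the holdout estimate and the true accuracy of the refit model over reps datasets."""
    rng = random.Random(seed)
    mean = [delta / 2.0 / math.sqrt(dim)] * dim  # class means are delta apart
    half = n_per_class // 2
    holdout_sum = 0.0
    full_sum = 0.0
    for _ in range(reps):
        # Drawing the two parts separately is equivalent to a stratified 50/50 split.
        train = sample(half, mean, rng)
        test = sample(n_per_class - half, mean, rng)
        holdout_sum += empirical_accuracy(fit_centroid(train), test)
        # Refit on the whole dataset; its accuracy is only knowable here
        # because the distribution is synthetic.
        full_sum += true_accuracy(fit_centroid(train + test), mean)
    return holdout_sum / reps, full_sum / reps


est, true_full = run()
print(f"mean holdout estimate (model fit on training half): {est:.3f}")
print(f"mean true accuracy (model refit on whole dataset):  {true_full:.3f}")
```

Under these assumptions the averaged holdout estimate should come out below the averaged true accuracy of the refit model. The size of the gap depends on how far the model is from its capacity: with a much larger `n_per_class` the two numbers should move closer together.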